Déjà vu - A study of duplicate citations in Medline

Authors

  • Mounir Errami
  • Justin M. Hicks
  • Wayne G. Fisher
  • David Trusty
  • Jonathan D. Wren
  • Tara C. Long
  • Harold R. Garner
Abstract

MOTIVATION Duplicate publication impacts the quality of the scientific corpus and has been difficult to detect, and studies thus far have been limited in scope and size. Using text similarity searches, we were able to identify signatures of duplicate citations among a body of abstracts. RESULTS A sample of 62,213 Medline citations was examined and a database of manually verified duplicate citations was created to study author publication behavior. We found that 0.04% of the citations with no shared authors were highly similar and are thus potential cases of plagiarism. A further 1.35%, with shared authors, were sufficiently similar to be considered duplicates. Extrapolating, this would correspond to 3500 and 117,500 duplicate citations in total, respectively. AVAILABILITY eTBLAST, an automated citation matching tool, and Déjà vu, the duplicate citation database, are freely available at http://invention.swmed.edu/ and http://spore.swmed.edu/dejavu
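
The extrapolation in the abstract can be checked with simple arithmetic. The sketch below is an illustration, not part of the paper: it divides each extrapolated count by its sample rate to back out the corpus size those figures imply. The abstract does not state that size, so the result is an inference from rounded percentages, and the function name `implied_corpus_size` is hypothetical.

```python
def implied_corpus_size(extrapolated_count: int, sample_rate_percent: float) -> float:
    """Back out the corpus size implied by an extrapolated count and a sample rate.

    Assumption: the extrapolation is a straight proportional scaling
    (count = rate * corpus size); the abstract does not spell this out.
    """
    return extrapolated_count / (sample_rate_percent / 100.0)


# Figures taken from the abstract: 0.04% -> 3500 and 1.35% -> 117,500.
no_shared_authors = implied_corpus_size(3500, 0.04)
shared_authors = implied_corpus_size(117_500, 1.35)

print(f"Implied corpus (no shared authors): {no_shared_authors:,.0f}")
print(f"Implied corpus (shared authors):    {shared_authors:,.0f}")
```

Both estimates land near 8.7 million citations, which suggests the two extrapolations were scaled against the same corpus. The small gap between them is consistent with the percentages being rounded.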

Similar resources

Comparing Medline citations using modified N-grams

OBJECTIVE We aim to identify duplicate pairs of Medline citations, particularly when the documents are not identical but contain similar information. MATERIALS AND METHODS Duplicate pairs of citations are identified by comparing word n-grams in pairs of documents. N-grams are modified using two approaches which take account of the fact that the document may have been altered. These are: (1) d...
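
Comparing word n-grams between two documents can be sketched as follows. This is a plain, unmodified comparison that scores a pair by Jaccard overlap of their n-gram sets; the two modifications mentioned in the abstract are truncated above and are not reproduced here. The function names, the default of n = 3, and the sample sentences are assumptions for illustration only.

```python
import re


def word_ngrams(text: str, n: int = 3) -> set:
    """Return the set of word n-grams (as tuples) in a lowercased text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the word n-gram sets of two texts (0.0 to 1.0)."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0  # too short to form any n-gram
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Hypothetical pair: one word changed between otherwise identical sentences.
first = "Duplicate publication impacts the quality of the scientific corpus"
second = "Duplicate publication affects the quality of the scientific corpus"
score = ngram_similarity(first, second)
print(score)
```

A single substituted word breaks every n-gram that spans it, so the score drops sharply even for near-identical text. That sensitivity is one reason an approach might modify the n-grams to tolerate altered documents.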

Déjà vu: a database of highly similar citations in the scientific literature

In the scientific research community, plagiarism and covert multiple publications of the same data are considered unacceptable because they undermine the public confidence in the scientific integrity. Yet, little has been done to help authors and editors to identify highly similar citations, which sometimes may represent cases of unethical duplication. For this reason, we have made available Dé...

Identifying duplicate publications: primum non nocere.

Errami and Garner published in the January 24, 2008 issue of Nature an article entitled “A Tale of Two Citations” that deals with the timely subject of publication of duplicate papers (1). With the increase in the number of scientific journals and the ongoing pressure to publish in them, inappropriate and unethical practices such as plagiarism, unauthorized “cosubmission” of a paper to two or ...

Identifying duplicate content using statistically improbable phrases

MOTIVATION Document similarity metrics such as PubMed's 'Find related articles' feature, which have been primarily used to identify studies with similar topics, can now also be used to detect duplicated or potentially plagiarized papers within literature reference databases. However, the CPU-intensive nature of document comparison has limited MEDLINE text similarity studies to the comparison of...

Copeptin response to clinical maximal exercise tests.

has been renamed “Update,” as some users expressed concern over the word “Duplicate.” With these changes in mind, we would like to emphasize that the ultimate purpose of the Déjà Vu database is to maintain the integrity of biomedical literature, a goal that can be achieved only by a thorough and accurate interpretation of the information contained within. We therefore extend to both the editors...

Journal title:
  • Bioinformatics

Volume 24, Issue 2

Pages: -

Publication date: 2008